Part 1: Foundation & Data Validation
Building Trust Through Systematic QA/QC
Why EDA is Not Optional
Before diving into the technical details, let’s address a fundamental question: Why do mining projects fail?
The answer often lies in the quality of the foundational data: in practice, a large share of resource estimation errors can be traced back to inadequate data validation and rushed or skipped EDA.
The GIGO Principle
Garbage In, Garbage Out (GIGO) - This principle is fundamental to all data analysis work. Poor quality data will always produce poor models, leading to:
- Inaccurate resource estimates
- Failed mine plans
- Investor confidence loss
- Regulatory non-compliance
- Millions in losses
In mining, we don’t get second chances: the quality of our EDA determines whether we build a viable mine or lose millions through poor decisions.
EDA as an Industry Standard
EDA is not just “best practice” - it’s a mandatory requirement for professional resource estimation.
JORC Code Compliance
The JORC Code requires that all resource reports be:
- Transparent: Methods and data quality must be clearly documented
- Material: All relevant information affecting value must be disclosed
- Competent: Prepared by qualified professionals
EDA directly supports these requirements by:
- Documenting data quality and limitations
- Identifying material data issues
- Providing evidence for geological interpretations
Competent Person Responsibilities
For a Competent Person (CP), thorough EDA is part of due diligence. You are responsible for:
- Verifying data integrity
- Documenting QA/QC procedures
- Ensuring estimation assumptions are data-supported
- Defending your resource model to auditors and regulators
The 4 Pillar EDA Framework
This series follows a systematic 4-pillar approach to EDA, with each pillar building on the previous one to create a comprehensive understanding of your dataset.
Pillar 1: Data Validation & Integrity
The first pillar is the foundation of all subsequent analysis. Without proper data validation, all downstream work becomes meaningless.
What We Check
A comprehensive data validation workflow includes:
- File Integration Checks
  - Collar file completeness
  - Assay file completeness
  - Lithology file completeness
  - Cross-file consistency
- Missing Data Detection
  - Collars without assay data
  - Assays without collar coordinates
  - Lithology gaps
- Interval Validation
  - Overlapping intervals
  - Gaps in sampling
  - Depth consistency
- Geometric Validation
  - Data above/below topography
  - Survey data quality
  - Coordinate system consistency
One bad data point can invalidate an entire block model if not caught early. Data integrity issues compound through every step of the modeling process.
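As a quick illustration of the interval and geometric sanity checks listed above, here is a minimal sketch in R. It assumes collar and assay tables with the column names used in the walkthrough below (hole_id, x, y, from, to); checks against topography or downhole surveys would additionally need those datasets, and the coordinate bounds shown are placeholders for your own project envelope.
Code
library(dplyr)

# Duplicate collar IDs: a hole should appear exactly once in the collar file
dup_collars <- collar %>%
  count(hole_id) %>%
  filter(n > 1)

# Impossible intervals: negative depths, or 'from' not strictly less than 'to'
bad_intervals <- assay %>%
  filter(from < 0 | to <= from)

# Collars outside the expected project envelope (placeholder bounds;
# substitute your project's coordinate limits and confirm the coordinate system)
out_of_range <- collar %>%
  filter(x < 500000 | x > 501000 | y < 9000000 | y > 9001000)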
Practical Implementation with GeoDataViz
Let’s see how these validation checks are implemented using real drilling data.
Step 1: Load Required Libraries
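The walkthrough assumes these packages are already installed; if any are missing, a one-time install (a sketch, assuming a configured CRAN mirror) covers them:
Code
# One-time setup: install the packages used throughout this part
install.packages(c("dplyr", "tidyr", "ggplot2", "DT", "plotly", "janitor"))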
Code
library(dplyr)
library(tidyr)
library(ggplot2)
library(DT)
library(plotly)
library(janitor)

Step 2: Create Sample Data
For this demonstration, we’ll create simulated drilling data that mimics real geological scenarios:
Code
# Create simulated collar data
set.seed(123)
n_holes <- 50

collar <- data.frame(
  hole_id = paste0("DDH", sprintf("%03d", 1:n_holes)),
  x = runif(n_holes, 500000, 501000),
  y = runif(n_holes, 9000000, 9001000),
  rl = runif(n_holes, 100, 200)
)

# Create simulated assay data (multiple intervals per hole)
assay_list <- lapply(collar$hole_id, function(hid) {
  n_intervals <- sample(15:25, 1)
  depths <- seq(0, by = 2, length.out = n_intervals)
  data.frame(
    hole_id = hid,
    from = depths[-length(depths)],
    to = depths[-1],
    au_ppm = pmax(0, rnorm(n_intervals - 1, mean = 1.5, sd = 2)),
    ag_ppm = pmax(0, rnorm(n_intervals - 1, mean = 15, sd = 20)),
    cu_pct = pmax(0, rnorm(n_intervals - 1, mean = 0.5, sd = 0.8))
  )
})
assay <- do.call(rbind, assay_list)

# Create simulated lithology data
litho_codes <- c("Andesite", "Diorite", "Mineralized_Zone", "Altered_Volcanics")

lithology_list <- lapply(collar$hole_id, function(hid) {
  n_litho <- sample(4:8, 1)
  depths <- sort(c(0, sample(5:40, n_litho - 1), 50))
  data.frame(
    hole_id = hid,
    from = depths[-length(depths)],
    to = depths[-1],
    lithology = sample(litho_codes, n_litho, replace = TRUE)
  )
})
lithology <- do.call(rbind, lithology_list)

# Clean names
collar <- janitor::clean_names(collar)
assay <- janitor::clean_names(assay)
lithology <- janitor::clean_names(lithology)

# Display structure
cat("Collar records:", nrow(collar), "\n")

Collar records: 50
Code
cat("Assay records:", nrow(assay), "\n")

Assay records: 981

Code
cat("Lithology records:", nrow(lithology), "\n")

Lithology records: 309
This is simulated data designed to demonstrate EDA workflows. In practice, you would load your own drilling data from CSV files or databases.
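If you are working with real data, a typical load step looks like the sketch below. The file paths are placeholders; readr::read_csv() or a database connection works equally well, and clean_names() converts column headers to the snake_case used throughout this series.
Code
# Load real drilling data from CSV exports (paths are placeholders)
library(dplyr)
library(janitor)

collar    <- read.csv("data/collar.csv")    %>% clean_names()
assay     <- read.csv("data/assay.csv")     %>% clean_names()
lithology <- read.csv("data/lithology.csv") %>% clean_names()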
Step 3: File Record Count Validation
The first check: do we have data in all files?
Code
file_counts <- data.frame(
  File = c("Collar", "Assay", "Lithology"),
  Records = c(nrow(collar), nrow(assay), nrow(lithology))
)

datatable(file_counts,
          options = list(dom = 't'),
          caption = "Table 1: File Record Counts")

All three files should have records. Empty files indicate data loading issues that must be resolved before proceeding.
Step 4: Cross-File Consistency Checks
Check 1: Collars Missing Assay Data
Code
# Identify collars without assay data
missing_assay <- anti_join(
  collar %>% distinct(hole_id),
  assay %>% distinct(hole_id),
  by = "hole_id"
)

if(nrow(missing_assay) > 0) {
  datatable(missing_assay,
            caption = "Table 2: Collars Missing Assay Data",
            options = list(pageLength = 5))
} else {
  cat("✓ All collars have corresponding assay data.\n")
}

✓ All collars have corresponding assay data.
Check 2: Assays Missing Collar Data
Code
# Identify assays without collar coordinates
missing_collar <- anti_join(
  assay %>% distinct(hole_id),
  collar %>% distinct(hole_id),
  by = "hole_id"
)

if(nrow(missing_collar) > 0) {
  datatable(missing_collar,
            caption = "Table 3: Assays Missing Collar Data",
            options = list(pageLength = 5))
} else {
  cat("✓ All assays have corresponding collar coordinates.\n")
}

✓ All assays have corresponding collar coordinates.
Mismatches often result from:
- Typos in hole IDs (e.g., “DDH001” vs “DDH-001”); a quick ID-normalization check, sketched after this list, catches these
- Incomplete data transfers
- Holes logged but not yet assayed
- Data entry errors
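Before declaring a true mismatch, it is worth testing whether the IDs differ only in formatting. A minimal sketch follows, using the collar and assay tables from above; the helper name normalize_hole_id is ours, and the raw IDs should not be overwritten until any changes are documented.
Code
# Hypothetical helper: strip separators and force upper case so that
# "DDH-001", "ddh 001" and "DDH001" all compare as equal
normalize_hole_id <- function(id) {
  toupper(gsub("[^A-Za-z0-9]", "", id))
}

# Re-run the cross-file check on normalized IDs (comparison only)
still_missing <- anti_join(
  collar %>% transmute(hole_id = normalize_hole_id(hole_id)) %>% distinct(),
  assay  %>% transmute(hole_id = normalize_hole_id(hole_id)) %>% distinct(),
  by = "hole_id"
)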
Step 5: Interval Validation
One of the most critical checks: ensuring assay intervals are continuous without gaps or overlaps.
Code
# Check for interval errors (gaps/overlaps)
interval_errors <- assay %>%
  arrange(hole_id, from) %>%
  group_by(hole_id) %>%
  mutate(
    prev_to = lag(to),
    has_error = !is.na(prev_to) & (from != prev_to)
  ) %>%
  ungroup() %>%
  filter(has_error) %>%
  select(hole_id, prev_to, from, to)

if(nrow(interval_errors) > 0) {
  datatable(interval_errors,
            caption = "Table 4: Interval Errors (Gaps/Overlaps)",
            options = list(pageLength = 10, scrollX = TRUE)) %>%
    formatStyle('from', backgroundColor = '#ffebee') %>%
    formatStyle('prev_to', backgroundColor = '#fff9c4')
} else {
  cat("✓ No interval gaps or overlaps detected.\n")
}

✓ No interval gaps or overlaps detected.
Interval errors can cause:
- Incorrect composite calculations
- Grade dilution or concentration artifacts
- Inaccurate tonnage estimates
- Biased variography
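The check above flags gaps and overlaps together. A small extension (a sketch built on the same interval_errors table) labels each flagged row and measures its size, which makes the follow-up with the logging or database team easier:
Code
# Classify each flagged interval: a gap starts after the previous sample ended,
# an overlap starts before the previous sample ended
interval_errors_classified <- interval_errors %>%
  mutate(
    error_type = if_else(from > prev_to, "gap", "overlap"),
    length_m   = abs(from - prev_to)   # size of the gap or overlap in metres
  ) %>%
  arrange(desc(length_m))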
Visualization: Interval Error Example
Code
# Create example visualization if errors exist
if(nrow(interval_errors) > 0) {
  # Take first hole with errors as example
  example_hole <- interval_errors$hole_id[1]

  example_data <- assay %>%
    filter(hole_id == example_hole) %>%
    arrange(from) %>%
    head(10)

  ggplot(example_data, aes(y = from, yend = to)) +
    geom_segment(aes(x = 0, xend = 1), size = 8, color = "steelblue", alpha = 0.7) +
    geom_text(aes(x = 0.5, y = (from + to)/2, label = paste0(from, "-", to)),
              color = "white", fontface = "bold", size = 3) +
    scale_y_reverse() +
    coord_flip() +
    labs(
      title = paste("Interval Visualization:", example_hole),
      subtitle = "Look for gaps (white space) or overlaps (overlapping segments)",
      x = NULL,
      y = "Depth (m)"
    ) +
    theme_minimal() +
    theme(
      axis.text.y = element_blank(),
      axis.ticks.y = element_blank(),
      panel.grid.major.y = element_blank()
    )
}

Data Validation Summary
Key Metrics Dashboard
Code
# Create validation summary
validation_summary <- data.frame(
  Check = c(
    "Total Collars",
    "Total Assay Intervals",
    "Total Lithology Intervals",
    "Collars Missing Assays",
    "Assays Missing Collars",
    "Interval Errors"
  ),
  Count = c(
    nrow(collar),
    nrow(assay),
    nrow(lithology),
    nrow(missing_assay),
    nrow(missing_collar),
    nrow(interval_errors)
  ),
  Status = c(
    "✓", "✓", "✓",
    ifelse(nrow(missing_assay) == 0, "✓", "⚠"),
    ifelse(nrow(missing_collar) == 0, "✓", "⚠"),
    ifelse(nrow(interval_errors) == 0, "✓", "⚠")
  )
)

datatable(validation_summary,
          options = list(dom = 't', ordering = FALSE),
          caption = "Table 5: Data Validation Summary",
          rownames = FALSE) %>%
  formatStyle(
    'Status',
    color = styleEqual(c('✓', '⚠'), c('green', 'orange')),
    fontWeight = 'bold'
  )

Best Practices for Data Validation
Documentation Requirements
For JORC compliance, document:
- Data Sources
  - Who collected the data?
  - When was it collected?
  - What QA/QC protocols were followed in the field?
- Validation Process
  - What checks were performed?
  - What issues were found?
  - How were issues resolved?
- Data Limitations
  - Known gaps or uncertainties
  - Data quality issues that couldn’t be resolved
  - Impact on estimation confidence
Common Pitfalls to Avoid
- Rushing validation to meet deadlines - Always leads to problems later
- Assuming data is clean - Always validate, even from trusted sources
- Fixing issues without documentation - Record all changes for an audit trail (a minimal logging sketch follows this list)
- Ignoring “small” errors - Small errors compound in complex workflows
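One lightweight way to build that audit trail is to write each validation run to timestamped files. This is only a sketch: the folder name and function are our assumptions, and in a production workflow you would log to your project's documented QA/QC location.
Code
# Write validation outputs to timestamped CSVs so every run is traceable
# (assumed folder "qaqc_logs/"; adjust to your project's structure)
log_validation_run <- function(summary_df, errors_df, dir = "qaqc_logs") {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")

  write.csv(summary_df, file.path(dir, paste0("validation_summary_", stamp, ".csv")),
            row.names = FALSE)
  write.csv(errors_df, file.path(dir, paste0("interval_errors_", stamp, ".csv")),
            row.names = FALSE)
}

# Example usage with the objects created earlier
log_validation_run(validation_summary, interval_errors)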
Integration and Data Merging
Once validation is complete, we can safely merge our datasets:
Code
# Standardize column names
collar_std <- collar %>%
  select(hole_id, x = x, y = y, z = rl) %>%
  mutate(hole_id = as.character(hole_id))

assay_std <- assay %>%
  select(hole_id, from, to, everything()) %>%
  mutate(hole_id = as.character(hole_id))

lithology_std <- lithology %>%
  select(hole_id, from, to, lithology) %>%
  mutate(hole_id = as.character(hole_id))

# Merge data
combined_data <- assay_std %>%
  left_join(collar_std, by = "hole_id") %>%
  mutate(mid_point = from + (to - from) / 2) %>%
  left_join(
    lithology_std %>% rename(litho_from = from, litho_to = to),
    by = join_by(hole_id, between(mid_point, litho_from, litho_to))
  ) %>%
  select(-mid_point, -litho_from, -litho_to)

cat("Combined dataset rows:", nrow(combined_data), "\n")

Combined dataset rows: 1090

Code
cat("Columns:", paste(names(combined_data), collapse = ", "), "\n")

Columns: hole_id, from, to, au_ppm, ag_ppm, cu_pct, x, y, z, lithology
Preview Combined Data
Code
datatable(head(combined_data, 50),
          options = list(
            pageLength = 10,
            scrollX = TRUE,
            scrollY = "400px"
          ),
          caption = "Table 6: Combined Dataset Preview") %>%
  formatRound(columns = c('from', 'to', 'x', 'y', 'z'), digits = 2)

Checklist: Before Moving to Pillar 2
Before proceeding to spatial analysis, ensure:
- Collar, assay, and lithology files are loaded and non-empty
- Every hole ID cross-references cleanly between the three files
- No unexplained interval gaps or overlaps remain
- All issues found during validation have been resolved or documented
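If you prefer this gate to be enforced in code rather than by eye, a minimal sketch using the objects built earlier (named stopifnot() conditions require R 3.5 or later) looks like this:
Code
# Hard stop if any cross-file or interval check failed; the condition name
# becomes the error message, which also documents what broke
stopifnot(
  "Collars without assays"  = nrow(missing_assay) == 0,
  "Assays without collars"  = nrow(missing_collar) == 0,
  "Interval gaps/overlaps"  = nrow(interval_errors) == 0
)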
With clean, validated data, you’re ready to explore spatial patterns in Part 2: Spatial & Statistical Analysis.
Summary
Data validation is the foundation of reliable resource estimation. Key takeaways:
- Never skip validation - It’s mandatory for JORC compliance
- Check everything - Files, intervals, cross-references
- Document thoroughly - Create audit trails
- Fix issues early - Problems compound downstream
- Validate assumptions - Don’t trust data blindly
Remember the GIGO principle: Quality data is the only path to quality models.
Tools and Resources
- GeoDataViz: GitHub Repository
- JORC Code: 2012 Edition
- Contact: ghoziankarami@gmail.com